Dealing with Uncertainty about Item Parameters:
Expected  Response  Functions 


Abstract 

It is a common practice in item response theory (IRT) to treat estimates of
item parameters, say $\hat{B}$, as if they were the known, true quantities, B.
However, ignoring the uncertainty associated with item parameters can lead
to biases and over-confidence in subsequent inferences such as ability
estimation, especially when item-calibration samples are small. This paper
demonstrates  how  to  incorporate  uncertainty  about  B  with  Lewis’s 
“expected  response  functions”  (ERFs),  pointwise  expected  values  of  item 
response  conditional  on  examinee  proficiency  averaged  over  posterior 
distributions  of  item  parameters.  This  paper  presents  ERFs,  outlines 
procedures  for  computing  them  and  using  them  in  practical  work,  and  gives 
an  illustration  with  data  from  the  National  Assessment  of  Educational 
Progress. Advantages of approximating ERFs with
members of familiar parametric families of IRT curves are noted.

Key  words:  Bayesian  estimation,  expected  response  functions,  item 
response  theory,  multiple  imputation,  pseudolikelihood 
estimation 


Introduction 


Item response theory (IRT) models posit that an examinee’s chances of correctly
answering test items depend on an unobservable parameter for that examinee ($\theta$) and for
each of the items ($\beta_j$, for $j = 1, \ldots, n$). It is common to estimate the item parameters from the
responses of a “calibration sample” of examinees, then treat the estimates
$\hat{B} = (\hat{\beta}_1, \ldots, \hat{\beta}_n)$ as
if they were true parameter values in subsequent inferences such as estimating examinees’
proficiency parameters. Tsutakawa and Johnson (1990) found that ignoring uncertainty
about 3-parameter logistic (3PL) item parameters from a calibration sample of 400 led to
biased posterior means for $\theta$s and understatement of posterior standard deviations by more
than 40 percent on average.

Approaches  that  take  uncertainty  about  B  into  account  include  a  second-order 
Taylor  series  expansion  with  an  asymptotic  normal  approximation  for  p(B)  (Tsutakawa  & 
Soltys,  1988;  Tsutakawa  &  Johnson,  1990),  numerical  integration  over  a  normal 
approximation  (Jones,  Wainer,  &  Kaplan,  1984),  multiple  imputation  (Mislevy  &  Yan, 
1991),  and  Gibbs  sampling  (Albert,  1992).  This  paper  presents  approximations  based  on 
Lewis’s  (1985)  notion  of  “expected  response  functions”  (ERFs),  pointwise  expected 
values of item response conditional on $\theta$ as averaged over posterior distributions of item
parameters.  (See  Mislevy,  Sheehan,  &  Wingersky,  1993,  on  the  use  of  ERFs  in  IRT  test 
equating  when  information  about  item  parameters  is  limited.) 

The  following  section  describes  the  problem  and  reviews  previous  solutions.  ERFs 
and  computing  approximations  are  then  given.  Their  use  is  illustrated  with  data  from  the 
National  Assessment  of  Educational  Progress. 

Background  and  Notation 


Item  Response  Theory 

This paper confines discussion to scalar parametric IRT models for dichotomous
(right/wrong) test items, but the ideas can be extended to more complex models. Define
$F_j(\theta)$, the item response function for Item j, as follows:

$$F_j(\theta) = \mathrm{Prob}(x_j = 1 \mid \theta, \beta_j), \tag{1}$$
where $x_j$ is the response to Item j, 1 for right and 0 for wrong, $\theta$ is the examinee
proficiency parameter, and $\beta_j$ is the (possibly vector-valued) parameter for Item j. For
example, under the 3-parameter logistic (3PL) model,

$$F_j(\theta) = c_j + (1 - c_j)\,\Psi[a_j(\theta - b_j)],$$

where $\Psi$ is the logistic distribution $\Psi(z) = [1 + \exp(-z)]^{-1}$ and $\beta_j \equiv (a_j, b_j, c_j)$ (Lord, 1980).
The density $p(x_j \mid \theta, \beta_j)$ is thus $F_j(\theta)$ if $x_j = 1$ and $1 - F_j(\theta)$ if $x_j = 0$. Under the usual IRT
assumption of conditional independence, the probability of a vector of responses
$x = (x_1, \ldots, x_n)$ to n items is the product over items of terms based on (1):

$$p(x \mid \theta, B) = \prod_{j=1}^{n} p(x_j \mid \theta, \beta_j) = \prod_{j=1}^{n} F_j(\theta)^{x_j}\,[1 - F_j(\theta)]^{1 - x_j}. \tag{2}$$


Equation 2 is the basis for estimating an examinee’s $\theta$. Suppose x and B were
known. For maximum likelihood estimation, one finds the value of $\theta$ that maximizes (2),
namely, the MLE $\hat{\theta}$. The asymptotic variance of the MLE is the inverse of the Fisher
information function, which is a sum of contributions over items:

$$\mathrm{Var}(\hat{\theta} \mid \theta, B) \approx \left\{ \sum_{j=1}^{n} \frac{\left[ \frac{\partial}{\partial\theta} F_j(\theta) \right]^2}{F_j(\theta)\,[1 - F_j(\theta)]} \right\}^{-1}. \tag{3}$$
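As a minimal sketch of Equations 1 through 3, the following Python code evaluates a 3PL response function, the log of the likelihood in (2), a grid-search MLE, and the asymptotic variance in (3). It assumes the unscaled logistic form of the 3PL (no 1.7 scaling constant); the function names, the grid limits, and the three illustrative items are hypothetical and are not taken from the paper.

```python
import math

def irf_3pl(theta, a, b, c):
    """3PL item response function, assuming the unscaled logistic form."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def irf_slope(theta, a, b, c):
    """Derivative of the 3PL response function with respect to theta."""
    psi = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return (1.0 - c) * a * psi * (1.0 - psi)

def log_likelihood(theta, responses, items):
    """Log of Equation 2 for 0/1 responses and a list of (a, b, c) tuples."""
    total = 0.0
    for x, (a, b, c) in zip(responses, items):
        f = irf_3pl(theta, a, b, c)
        total += x * math.log(f) + (1 - x) * math.log(1.0 - f)
    return total

def mle_theta(responses, items, lo=-4.0, hi=4.0, steps=801):
    """Crude grid-search MLE; operational programs use Newton-type steps."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, responses, items))

def asymptotic_variance(theta, items):
    """Inverse of the Fisher information, as in Equation 3."""
    info = 0.0
    for a, b, c in items:
        f = irf_3pl(theta, a, b, c)
        info += irf_slope(theta, a, b, c) ** 2 / (f * (1.0 - f))
    return 1.0 / info

# Hypothetical items (a, b, c) and one response pattern.
items = [(1.0, -1.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
theta_hat = mle_theta([1, 1, 0], items)
var_hat = asymptotic_variance(theta_hat, items)
```

Evaluating `asymptotic_variance` at the MLE treats the item parameters as known, which is exactly the practice the rest of the paper questions.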


For Bayesian inference, if $p(\theta)$ represents prior knowledge about an examinee’s
proficiency before x is observed, then knowledge posterior to the observation is obtained
by Bayes theorem as

$$p(\theta \mid x, B) = \frac{p(x \mid \theta, B)\,p(\theta)}{\int p(x \mid \theta, B)\,p(\theta)\,d\theta}. \tag{4}$$

The posterior mean and variance are, respectively,

$$E(\theta \mid x, B) = \int \theta\,p(\theta \mid x, B)\,d\theta \tag{5}$$

and

$$\mathrm{Var}(\theta \mid x, B) = \int \theta^2\,p(\theta \mid x, B)\,d\theta - \left[ \int \theta\,p(\theta \mid x, B)\,d\theta \right]^2. \tag{6}$$
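Equations 4 through 6 can be approximated on a grid of $\theta$ values. The sketch below assumes a standard normal prior, the unscaled logistic 3PL, and an evenly spaced grid from -6 to 6; these choices and the function names are illustrative assumptions, not the paper's procedure.

```python
import math

def irf_3pl(theta, a, b, c):
    """3PL item response function, assuming the unscaled logistic form."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def posterior_moments(responses, items, lo=-6.0, hi=6.0, steps=241):
    """Posterior mean and variance of theta (Equations 5 and 6) on a grid.

    responses: list of 0/1 item scores; items: list of (a, b, c) tuples.
    The prior is N(0, 1); its normalizing constant cancels in Equation 4.
    """
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    weights = []
    for t in grid:
        prior = math.exp(-0.5 * t * t)
        like = 1.0
        for x, (a, b, c) in zip(responses, items):
            f = irf_3pl(t, a, b, c)
            like *= f if x == 1 else 1.0 - f
        weights.append(like * prior)          # numerator of Equation 4
    norm = sum(weights)                        # denominator of Equation 4
    mean = sum(t * w for t, w in zip(grid, weights)) / norm
    second = sum(t * t * w for t, w in zip(grid, weights)) / norm
    return mean, second - mean ** 2

# Hypothetical example: one correct response to an item with c = 0.2.
post_mean, post_var = posterior_moments([1], [(1.0, 0.0, 0.2)])
```

With no responses at all, the function simply returns the prior mean and variance, which is a convenient check on the grid.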

Uncertainty  About  Item  Parameters 

Equations 2 through 6 are written as conditional on B. It is common to evaluate
such expressions using a point estimate of B, or $\hat{B}$, as obtained for example from the
responses $X_{calib} = (x_1, \ldots, x_N)$ of a calibration sample of N examinees. For example, the
Bayes modal estimate of B when $p(\theta)$ is known maximizes the posterior distribution for B,

$$p(B \mid X_{calib}) \propto p(X_{calib} \mid B)\,p(B) = \prod_{i=1}^{N} \int p(x_i \mid \theta, B)\,p(\theta)\,d\theta \; p(B), \tag{7}$$

where p(B) expresses prior knowledge about B (e.g., Mislevy, 1986; Tsutakawa,
1984), perhaps uninformative, perhaps based on items’ content or skill requirements,
expert judgments, or experience with similar items (Mislevy, Sheehan, & Wingersky,
1993). In large samples, the posterior distribution can be approximated by a multivariate
normal distribution with mean $\hat{B}$ and variance $\Sigma_B$.

Values $\hat{B}$ and $\Sigma_B$ for an approximation could be obtained, for example, as maximum
likelihood or Bayesian modal estimates and asymptotic covariance matrix from Mislevy &
Bock’s (1983) BILOG program, as illustrated in the NAEP example below. In the sequel,
we simply use p(B) to stand for knowledge about B at a given point in time, regardless of
its source. Note that p(B) need not incorporate independence over items.

As Tsutakawa et al. demonstrate, ignoring the uncertainty about B (by treating $\hat{B}$ as
B) can lead to biases and understated uncertainties in subsequent inferences about $\theta$.
Incorporating this kind of uncertainty into analyses is straightforward from a Bayesian
perspective: Marginalize with respect to partially-known quantities. For example, the
so-called “marginal likelihood function” takes uncertainty about B into account in the
likelihood function by integrating (2) with respect to p(B):

$$\begin{aligned}
p(x \mid \theta) &= E_B[p(x \mid \theta, B)] \\
&= \int p(x \mid \theta, B)\,p(B)\,dB \\
&= \int \prod_{j=1}^{n} p(x_j \mid \theta, \beta_j)\,p(B)\,dB \\
&= \int \prod_{j=1}^{n} F_j(\theta)^{x_j}\,[1 - F_j(\theta)]^{1 - x_j}\,p(B)\,dB,
\end{aligned} \tag{9}$$

effectively the average of (2) over all possible values of B, each weighted by its probability
given the information from the calibration sample. More generally, if G(B) is any
expression involving item parameters, then

$$E_B[G(B)] = \int G(B)\,p(B)\,dB. \tag{10}$$


Alternative  Approaches 

Closed-form  solutions  of  (10)  are  not  generally  forthcoming  in  IRT.  Before 
introducing  expected  response  functions,  we  briefly  review  three  alternatives:  a  second- 
order  analytic  approximation,  multiple  imputation,  and  Gibbs  sampling.  The  discussion  of 
multiple  imputation  is  more  detailed,  because  the  ERF  approximation  shares  intermediary 
steps  with  multiple  imputation  and  the  NAEP  example  compares  numerical  results  from  the 
two  approaches. 

Tsutakawa’s second-order expansion uses an approximation due to Lindley (1980):

$$E_B[G(B)] \approx G(\hat{B}) + \frac{1}{2} \sum_r \sum_s G_{rs}\,\Sigma_{rs}, \tag{11}$$

where $G_{rs}$ is the $(r,s)$th element of $\partial^2[G(B)] / \partial B\,\partial B'$ and $\Sigma_{rs}$ is the $(r,s)$th element of $\Sigma_B$, with
r and s indexing elements of B. When calculating an examinee’s posterior mean (5), for
example, G(B) is $\int \theta\,p(\theta \mid x, B)\,d\theta$. Because such approximations would be exact if p(B)
were MVN($\hat{B}$, $\Sigma_B$), their performance in (10) depends on the accuracy of the asymptotic
normal approximation to p(B), which is often satisfactory in practice since even the usual
first-order approximation $G(\hat{B})$ suffices when the calibration sample is large and $p(B \mid X)$
is concentrated around $\hat{B}$. An impediment to using (11) in practical work is that
derivatives must be calculated for each function G to which it is applied.
Albert (1992) employed Gibbs sampling (Gelfand & Smith, 1990) to obtain a
discrete approximation to the joint posterior distribution of B and the vector of examinee
abilities $\Theta$ under the 2-parameter normal (2PN) IRT model. From vectors $B^{(t)}$ and $\Theta^{(t)}$
that approximate B and $\Theta$, one obtains a subsequent approximation by drawing $B^{(t+1)}$
from $p(B \mid \Theta = \Theta^{(t)}, X)$, then drawing $\Theta^{(t+1)}$ from $p(\Theta \mid B = B^{(t+1)}, X)$. From initial
approximations, repeated cycles achieve (under regularity conditions) a stochastic
convergence such that a $(\Theta, B)$ draw obtained in this manner is essentially a draw from the
correct posterior $p(\Theta, B \mid X)$. Widely spaced draws from a sequence which has attained
convergence (or, better still, from separate sequences initiated from different starting
points; see Gelman & Rubin, 1992) are essentially independent draws from $p(\Theta, B \mid X)$.
Evaluating any function $G(\Theta, B)$ of the parameters with respect to each of these draws
constitutes a discrete approximation of its posterior distribution. (This last idea will be
illustrated below with multiple imputation.) In particular, the discrete approximation of
p(B) can serve as a basis for calculating expected response functions. Gibbs sampling is
much more computationally intensive than the other approximations described in this paper.

Multiple  imputation,  introduced  by  Rubin  (1987)  to  handle  missing  responses  in 
sample  surveys,  creates  pseudo  datasets  with  draws  from  the  posterior  distributions  of 
missing  data,  and  combines  the  results  of  standard  analyses  of  pseudo  data  sets  so  as  to 
incorporate  the  uncertainty  that  missingness  engenders.  B  plays  the  role  of  missing  data  in 
the  problem  of  imperfect  knowledge  about  item  parameters  (Mislevy  &  Yan,  1991). 
Suppose that if B were known, we could calculate the posterior mean and variance of
G(B), say, $\bar{G}(B)$ and V(B). An example again would be the posterior mean and variance
for an examinee’s $\theta$ via (5) and (6). The steps for multiple-imputation approximations of
the posterior mean and variance that take uncertainty about B into account, say, $\bar{G}$ and V,
are outlined below. The reader is referred to Rubin (1987) for theoretical justification.

1. Obtain the posterior distribution for B, p(B) (e.g., the multivariate normal
approximation MVN($\hat{B}$, $\Sigma_B$) used in the following NAEP example).

2. Draw K item parameter vectors from p(B), say $B_k$ for $k = 1, \ldots, K$.

3. For each k, calculate the posterior mean and variance conditional on $B = B_k$, denoted
$\bar{G}(B_k)$ and $V(B_k)$.

4. The posterior mean for G, accounting for uncertainty about B, is approximated by
the average of the K conditional posterior means:

$$\bar{G} = K^{-1} \sum_k \bar{G}(B_k). \tag{12}$$

5. The posterior variance for G, accounting for uncertainty about B, is approximated
by the sum of two terms:

$$V = U + (1 + K^{-1})\,v, \tag{13}$$

where the first,

$$U = K^{-1} \sum_k V(B_k),$$

approximates the variance that would exist even if B were known with certainty,
and the second,

$$v = (K - 1)^{-1} \sum_k [\bar{G}(B_k) - \bar{G}]^2,$$

quantifies additional uncertainty due to not knowing B.
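Steps 4 and 5 can be sketched in a few lines of Python. The code assumes the combining rule in Rubin (1987), in which the between-draw variance is inflated by the factor (1 + 1/K); the function name and the three illustrative conditional results are hypothetical.

```python
def combine_imputations(cond_means, cond_vars):
    """Combine K conditional posterior means and variances.

    cond_means[k] and cond_vars[k] are the posterior mean and variance
    computed as if the k-th draw of the item parameters were the truth.
    Returns (overall mean, total variance, within part U, between part v).
    """
    k = len(cond_means)
    g_bar = sum(cond_means) / k                               # Equation 12
    u = sum(cond_vars) / k                                    # average conditional variance
    v = sum((g - g_bar) ** 2 for g in cond_means) / (k - 1)   # between-draw variance
    total = u + (1.0 + 1.0 / k) * v                           # Rubin's (1987) rule
    return g_bar, total, u, v

# Hypothetical results from K = 3 draws of the item parameters.
g_bar, total_var, u, v = combine_imputations([0.9, 1.0, 1.1], [0.2, 0.2, 0.2])
```

In this toy example the between-draw term adds about 0.013 to the within-draw variance of 0.2; ignoring uncertainty about B amounts to dropping that term.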

Example:  Data  from  NAEP 

We shall use a running example with data from the National Assessment of
Educational Progress (NAEP): responses to 19 items from 100 8- and 13-year old students
who participated in the 1986 and 1988 mathematics trend assessment. Table 1 gives
descriptive statistics and Bayesian posterior modal estimates $\hat{B} = (\hat{a}, \hat{b}, \hat{c})$ obtained with
Mislevy and Bock’s (1983) BILOG computer program. Table 2 gives the accompanying
approximation of the posterior covariance matrix $\Sigma_B$. Covariances among the three
parameters for the same item can be quite high, but relationships among parameters for
different items are uniformly much lower.

[[Tables  1  &  2  about  here]] 

A practical problem in applying multiple imputations is to determine the value of K
that provides the desired accuracy, which may differ with the target G. In the NAEP
example, Mislevy and Yan (1991) calculated examinees’ posterior means and variances
with K=10, 100, and 1000. K=10 proved stable for estimating posterior means, but not
for posterior variances, which were stable with K=100. Results for K=100 and K=1000
were indistinguishable. We use the K=100 results below as a baseline comparison for
corresponding estimates calculated with ERFs. The dotted lines in Figure 1 illustrate the
item response functions for four items from the NAEP example that correspond to 100
draws of B. (The solid and dashed lines will be discussed below.) These graphs depict
the nature and magnitude of uncertainty about item response functions, but not the mild
correlation among the curves induced by the nonzero inter-item covariances.

[[Figure  1  about  here]] 

Expected  Response  Functions 


Definition 

In dichotomous IRT models, the expected value of a correct response to Item j
given $\theta$ and B is $F_j(\theta) \equiv P(x_j = 1 \mid \theta, \beta_j)$. If $\beta_j$ is only partially known, through p(B), the
probability of a correct response conditional on $\theta$ but marginal with respect to B can be
written as

$$F_j^*(\theta) \equiv E_B[F_j(\theta)] = \int p(x_j = 1 \mid \theta, \beta_j)\,p(B)\,dB, \tag{14}$$

an “expected response function” that gives the probability of correct response conditional
on $\theta$ taking into account uncertainty about B (Lewis, 1985).

Even though $F_j^*$ is the expected value of a correct response at each value of $\theta$, it is
not the same as $F_j(\theta)$ evaluated with the expected value of $\beta_j$. This can be seen in Figure
1, which shows expected response functions (dashed lines) for the four items from the
NAEP example, along with the curves that correspond to $F_j(\theta)$ as evaluated with the point
estimate $\hat{\beta}_j$ (solid lines). In particular, the ERF is generally flatter.

The shape of $F_j^*$ depends on the shape of $F_j$ and the character of $p(\beta_j)$. In general,
$F_j^*$ and $F_j$ will not be of the same functional form. Lewis (1985) shows that if $F_j$ were
2PN and $p(\beta_j) = p(a_j, b_j)$ were bivariate normal, then $F_j^*$ would be a 2-parameter ogive with
a Student’s t shape. Its location parameter, $b_j^*$, would have the same value as the Bayes
mean estimate for $b_j$, or $\bar{b}_j$, but its slope parameter, $a_j^*$, would be attenuated from the
Bayes mean estimate for $a_j$. A simpler result is obtained if $a_j$ is known with certainty a
priori. If $p(b_j)$ is $N(\bar{b}_j, \sigma_j^2)$, then $F_j^*$ is also 2PN, with $b_j^* = \bar{b}_j$ and

$$a_j^* = a_j \big/ \sqrt{1 + a_j^2 \sigma_j^2}.$$
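The known-slope 2PN case can be checked numerically. Standard normal-ogive algebra gives an attenuated slope of $a / \sqrt{1 + a^2 \sigma^2}$ when the difficulty is distributed $N(\bar{b}, \sigma^2)$; the sketch below compares that closed form with a direct numerical average over the difficulty distribution. The function names and the illustrative values of a, $\bar{b}$, and $\sigma$ are assumptions made for the example.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def erf_2pn_numeric(theta, a, b_mean, b_sd, n=4001, width=8.0):
    """Average the 2PN curve over b ~ N(b_mean, b_sd^2) by the trapezoid rule."""
    lo = b_mean - width * b_sd
    h = 2.0 * width * b_sd / (n - 1)
    total = 0.0
    for i in range(n):
        b = lo + i * h
        dens = math.exp(-0.5 * ((b - b_mean) / b_sd) ** 2) / (b_sd * math.sqrt(2.0 * math.pi))
        end_weight = 0.5 if i in (0, n - 1) else 1.0
        total += end_weight * normal_cdf(a * (theta - b)) * dens * h
    return total

def erf_2pn_closed(theta, a, b_mean, b_sd):
    """2PN curve with the same location and an attenuated slope."""
    a_star = a / math.sqrt(1.0 + a * a * b_sd * b_sd)
    return normal_cdf(a_star * (theta - b_mean))

# Hypothetical item: slope 1.5, difficulty known only as N(0.3, 0.5^2).
gaps = [abs(erf_2pn_numeric(t, 1.5, 0.3, 0.5) - erf_2pn_closed(t, 1.5, 0.3, 0.5))
        for t in (-2.0, -1.0, 0.0, 1.0, 2.0)]
```

For these values the attenuated slope is 1.5 / 1.25 = 1.2, which illustrates why the ERF is flatter than the curve evaluated at the point estimate.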

Approximation  with  ERFs 

ERFs serve as a potential basis for taking uncertainty about B into account, by
replacing occurrences of $F_j$s with $F_j^*$s in functions of interest G(B). As examples,
consider the following:

Likelihood estimation of $\theta$ proceeds by maximizing an ERF-based analogue of the
likelihood, namely

$$p^*(x \mid \theta) \equiv \prod_{j=1}^{n} F_j^*(\theta)^{x_j}\,[1 - F_j^*(\theta)]^{1 - x_j}. \tag{15}$$

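One way to justify maximizing $p^*(x \mid \theta)$ is to view it as an approximation of the marginal
likelihood:

$$\begin{aligned}
p(x \mid \theta) &= E_B[p(x \mid \theta, B)] \\
&= \int \cdots \int \prod_{j=1}^{n} F_j(\theta)^{x_j}\,[1 - F_j(\theta)]^{1 - x_j}\,p(\beta_j \mid \beta_{j-1}, \ldots, \beta_1)\,d\beta_1 \cdots d\beta_n \\
&\approx \prod_{j=1}^{n} \int F_j(\theta)^{x_j}\,[1 - F_j(\theta)]^{1 - x_j}\,p(\beta_j)\,d\beta_j \\
&= p^*(x \mid \theta).
\end{aligned}$$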

The step in which the approximation occurs replaces each $p(\beta_j \mid \beta_{j-1}, \ldots, \beta_1)$ with
$p(\beta_j)$. Thus, if the information about items is independent (that is, $p(B) = \prod_j p(\beta_j)$), the
result is exact. Likelihood and Bayesian inferences about $\theta$ that take uncertainty about B
into account exhibit in this case the same conditional independence form as when item
parameters are known. In particular, applying standard procedures for known item
response functions to obtain MLEs and asymptotic variances (3), but with $F_j^*$s in place of
$F_j$s, gives the correct results. Independent posteriors for items can be assured or closely
approximated by coupling special item-calibration sampling designs and test construction
designs; the idea is that, for the items appearing in a test, the sets of examinees in the calibration
sample responding to each of them are completely or nearly disjoint. For example,
randomly equivalent calibration samples of examinees can be administered disjoint blocks
of items, and operational test forms can be built with items from different blocks.

A second justification applies even if p(B) is not independent over items. Although
the dependencies among items are ignored, (15) is an example of what Arnold and Strauss
(1991) call a “pseudo-likelihood” (see Appendix); under regularity conditions on the $F_j^*$s,
its maximum is a consistent estimator of $\theta$. Thus likelihood point estimates of $\theta$ based on
(15) tend to have the correct central tendency. Applying the standard MLE variance
formula (3) with $F_j^*$s tends to give too optimistic an impression of the uncertainty about
$\theta$s, however. But if the dependencies among items are small (and they tend toward zero
in long tests; Mislevy & Sheehan, 1989), the degree to which this value understates
uncertainty will also be small.

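Bayesian inference about $\theta$ can employ the above approximation $p^*(x \mid \theta)$ for
likelihoods. The posterior distribution for $\theta$ is thus approximated as

$$p^*(\theta \mid x) = \frac{p^*(x \mid \theta)\,p(\theta)}{\int p^*(x \mid \theta)\,p(\theta)\,d\theta},$$

and the posterior mean and variance are approximated as

$$E(\theta \mid x) = \int\!\!\int \theta\,p(\theta \mid x, B)\,p(B)\,d\theta\,dB \approx \int \theta\,p^*(\theta \mid x)\,d\theta \tag{16}$$

and

$$\begin{aligned}
\mathrm{Var}(\theta \mid x) &= \int\!\!\int \theta^2\,p(\theta \mid x, B)\,p(B)\,d\theta\,dB - \left[ \int\!\!\int \theta\,p(\theta \mid x, B)\,p(B)\,d\theta\,dB \right]^2 \\
&\approx \int \theta^2\,p^*(\theta \mid x)\,d\theta - \left[ \int \theta\,p^*(\theta \mid x)\,d\theta \right]^2.
\end{aligned} \tag{17}$$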
Again the approximations are exact if p(B) is independent over items, and indicators of
uncertainty  tend  to  be  optimistic  to  the  extent  that  dependencies  among  items  are 
nonnegligible.  Some  numerical  results  on  this  point  appear  in  the  NAEP  example. 

The test characteristic function is the expected number-correct score on a test of n
items as a function of $\theta$. Mislevy, Sheehan, & Wingersky (1993) obtained test
characteristic functions with ERFs, in order to equate tests with sparse item-calibration
data. IRT true-score equating determines number-right (or formula) scores on different
tests that correspond to the same values of $\theta$ (Lord, 1980). The expected number-right
score on Test A for an examinee with proficiency $\theta$ is obtained as

$$\tau_A(\theta) = \sum_{j \in T_A} p(x_j = 1 \mid \theta, \beta_j) = \sum_{j \in T_A} F_j(\theta), \tag{18}$$

where $T_A$ is the set of indices of items that appear in Test A. The expected score on Test
B is defined analogously. A score on Test A and a score on Test B are “true-score
equated” if they are the respective expected scores of the same value of $\theta$.

When knowledge about B is imperfect, one must equate scores that are expectations
conditional on $\theta$ but marginal with respect to p(B), rather than expected scores conditional
on $\theta$ and B. Writing $\tau_A(\theta)$ for the expected score in (18), the expected true score on Test A
given $\theta$ under these circumstances is thus

$$\tau_A^*(\theta) \equiv E_B[\tau_A(\theta)] = \sum_{j \in T_A} \int p(x_j = 1 \mid \theta, \beta_j)\,p(\beta_j)\,d\beta_j = \sum_{j \in T_A} F_j^*(\theta). \tag{19}$$

This is simply the sum of the probabilities of correct response item by item, whether or not
p(B) is independent over items. A score on Test A and a score on Test B are “expected
true-score equated” if they are the respective expected scores of the same value of $\theta$, as
defined by (19). Because only expected scores are needed for this equating method, the
expected test characteristic curves obtained in (19) are correct whether or not the posteriors
for individual items are independent.
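The claim that (19) holds with or without independence follows from the linearity of expectation, and it can be illustrated with a discrete set of joint draws of B. In the sketch below, the draws, the unscaled logistic 3PL form, and the function names are all hypothetical; the two functions compute the same quantity in different orders.

```python
import math

def irf_3pl(theta, a, b, c):
    """3PL item response function, assuming the unscaled logistic form."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def sum_of_erfs(theta, draws):
    """Right side of (19): average each item's curve over draws, then sum."""
    n_draws, n_items = len(draws), len(draws[0])
    total = 0.0
    for j in range(n_items):
        total += sum(irf_3pl(theta, *draws[k][j]) for k in range(n_draws)) / n_draws
    return total

def mean_of_test_curves(theta, draws):
    """E_B of the test characteristic function: sum within draw, then average."""
    return sum(sum(irf_3pl(theta, *item) for item in draw) for draw in draws) / len(draws)

# Hypothetical joint draws of (a, b, c) for a two-item test. The draws are
# deliberately dependent across items (both items shift together).
draws = [
    [(0.8, -0.5, 0.20), (1.1, 0.2, 0.15)],
    [(1.0, -0.3, 0.20), (1.3, 0.4, 0.15)],
    [(1.2, -0.1, 0.20), (1.5, 0.6, 0.15)],
]
tcc_a = sum_of_erfs(0.5, draws)
tcc_b = mean_of_test_curves(0.5, draws)
```

The two results agree up to rounding error even though the draws are dependent across items, which is the property the equating method relies on.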

Computing  Approximations 

As noted above, closed-form solutions for $F_j^*$ are not generally available. This
section describes how to use multiple-imputation or Gibbs-sampler discrete estimates of
$p(\beta_j)$ to estimate $F_j^*$ point by point across a grid of $\theta$ values for each item. Because only
$p(\beta_j)$ is involved for Item j, not the posteriors for other items, this process can be carried
out independently over items. Subsequent inferences about $\theta$ can be drawn using these
points in a discrete approximation of the $\theta$ distribution and the response curve, or a smooth
curve can be fit to the probabilities thus obtained.

There are operational advantages to using the closest curve from a familiar family to
approximate $F_j^*$ (for example, the closest 3PL curve in applications based on the 3PL
model, or the closest 2PL curve in applications based on the 1PL or 2PL). Let $F_j^{**}$ denote
such an approximation. This expedient makes it possible to use standard off-the-shelf
software designed for popular parametric IRT models to estimate examinee scores,
construct  tests,  or  draw  equating  lines.  If  additional  information  about  item  parameters 
becomes  available  over  time,  as  might  occur  as  examinee  responses  are  acquired  over  time 
in  operational  testing,  it  can  be  incorporated  into  the  system  by  merely  updating  item 
parameter  values  under  the  same  model.  If  the  IRT  model  were  correct  and  the  response 
function  were  stable  over  time,  the  sequence  of  expected  response  curves  would  converge 
toward  the  closest  member  of  the  family  to  the  true  curve — to  the  true  curve  itself,  if  it  were 
a  member  of  the  family. 

We now describe the operational procedures we have used for applied work with
ERFs. The expected response function for a particular item, $F_j^*$, is approximated as
follows:

1. Obtain an estimate of the posterior distribution $p(\beta_j)$. As noted above, this is
usually based on a calibration sample of examinee responses (say, MVN($\hat{\beta}_j$, $\Sigma_{\beta_j}$)
with parameter estimates from BILOG), but it may also be based partly or wholly
on collateral information about items such as content specifications and cognitive
processing requirements (Mislevy, Sheehan, & Wingersky, 1993).

2. Specify a grid of M theta values across the ability range of interest. Let $\theta_m$ denote
the mth grid point.

3. Draw K item parameter vectors from $p(\beta_j)$. Let $\beta_j^{(k)}$ be the kth such draw.

4. For each of the K sets of item parameters, determine $P_{jm}^{(k)}$, the probability of a
correct response to Item j at $\theta_m$, where $P_{jm}^{(k)} = p(x_j = 1 \mid \theta = \theta_m, \beta_j = \beta_j^{(k)})$.

5. Compute the expectation at each point $\theta_m$ by averaging the probabilities obtained in
Step 4:

$$F_j^*(\theta_m) \approx K^{-1} \sum_{k=1}^{K} P_{jm}^{(k)}.$$

We refer to the collection of points $\{(\theta_m, F_j^*(\theta_m)): m = 1, \ldots, M\}$ as the
"nonparametric" expected response function because it does not assume any particular
parametric form.
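The five steps above can be sketched as follows. The code assumes a multivariate normal posterior on the transformed scale (log a, b, logit c) used later in the NAEP example, the unscaled logistic 3PL, and a hand-written Cholesky factorization so that only the Python standard library is needed; the posterior mean vector and covariance matrix are hypothetical values chosen for illustration.

```python
import math
import random

def irf_3pl(theta, a, b, c):
    """3PL item response function, assuming the unscaled logistic form."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def cholesky(cov):
    """Lower-triangular factor of a small positive-definite matrix."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = cov[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def nonparametric_erf(mean, cov, grid, n_draws=100, seed=0):
    """Steps 3 to 5: draw item parameters, evaluate, and average pointwise."""
    rng = random.Random(seed)
    low = cholesky(cov)
    draws = []
    for _ in range(n_draws):                                   # Step 3
        z = [rng.gauss(0.0, 1.0) for _ in mean]
        log_a, b, logit_c = [m + sum(low[i][k] * z[k] for k in range(i + 1))
                             for i, m in enumerate(mean)]
        draws.append((math.exp(log_a), b, 1.0 / (1.0 + math.exp(-logit_c))))
    # Steps 4 and 5: probability at each grid point, averaged over the draws.
    return [sum(irf_3pl(t, *d) for d in draws) / n_draws for t in grid]

# Step 2: 31 evenly spaced grid points from -3 to +3.
grid = [-3.0 + 0.2 * m for m in range(31)]
# Step 1 (hypothetical posterior on the (log a, b, logit c) scale).
mean = [math.log(1.0), 0.0, math.log(0.2 / 0.8)]
cov = [[0.09, 0.02, 0.00],
       [0.02, 0.25, 0.01],
       [0.00, 0.01, 0.16]]
erf_points = nonparametric_erf(mean, cov, grid)
```

Because every drawn curve is increasing in $\theta$, the averaged points are increasing as well, and shrinking the covariance matrix toward zero brings them back to the curve at the point estimate.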


For applied work, it may be convenient to approximate the nonparametric ERF with
a continuous approximation $F_j^{**}$, say a spline or a close-fitting 2PL or 3PL curve. The use
of a 3PL will be illustrated below. Maximum likelihood estimates for the 3PL item
parameters $\beta_j^{**} = (a_j^{**}, b_j^{**}, c_j^{**})$ that best approximate $F_j^*$ are found by maximizing

$$\prod_{m=1}^{M} \left\{ F_j^{**}(\theta_m)^{F_j^*(\theta_m)}\,[1 - F_j^{**}(\theta_m)]^{1 - F_j^*(\theta_m)} \right\}^{W_m} \tag{20}$$

over the M-point theta grid, where $W_m$ is a weight that specifies the relative importance of
fitting $F_j^{**}$ at $\theta_m$. For example, weights may be selected to simulate a rectangular
distribution of examinees or a normal distribution of examinees. The maximum may be
obtained iteratively by using Newton’s method to obtain successive corrections to the
parameter estimates. We refer to the solution as a “fitted” expected response function.
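A fitted ERF can be sketched as a weighted pseudo-likelihood fit of a 3PL curve to the grid of ERF points. The code below is an illustration under stated assumptions: the criterion is taken to be the weighted Bernoulli log-likelihood with the ERF values as fractional responses, the 3PL is the unscaled logistic form, and a simple compass (pattern) search stands in for the Newton iterations described in the text. All names and starting values are hypothetical.

```python
import math

def irf_3pl(theta, a, b, c):
    """3PL item response function, assuming the unscaled logistic form."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def criterion(params, grid, target, weights):
    """Weighted log-likelihood of a 3PL curve against target probabilities."""
    a, b, c = params
    total = 0.0
    for t, f_star, w in zip(grid, target, weights):
        p = min(max(irf_3pl(t, a, b, c), 1e-12), 1.0 - 1e-12)
        total += w * (f_star * math.log(p) + (1.0 - f_star) * math.log(1.0 - p))
    return total

def fit_3pl(grid, target, weights, start=(1.0, 0.0, 0.2), tol=1e-5, max_sweeps=5000):
    """Compass search: try +/- steps on each parameter, halve steps when stuck."""
    best = list(start)
    best_val = criterion(best, grid, target, weights)
    steps = [0.5, 0.5, 0.1]
    sweeps = 0
    while max(steps) > tol and sweeps < max_sweeps:
        sweeps += 1
        improved = False
        for i in range(3):
            for sign in (1.0, -1.0):
                trial = list(best)
                trial[i] += sign * steps[i]
                if trial[0] <= 0.0 or not 0.0 <= trial[2] < 1.0:
                    continue                      # keep a > 0 and 0 <= c < 1
                val = criterion(trial, grid, target, weights)
                if val > best_val:
                    best, best_val, improved = trial, val, True
        if not improved:
            steps = [s / 2.0 for s in steps]
    return tuple(best)

# Sanity check: the target is itself an exact 3PL curve, so the fit should track it.
grid = [-3.0 + 0.2 * m for m in range(31)]
weights = [math.exp(-0.5 * t * t) for t in grid]        # standard normal weighting
target = [irf_3pl(t, 1.2, -0.3, 0.15) for t in grid]
fitted = fit_3pl(grid, target, weights)
```

In practice `target` would hold the nonparametric ERF points for an item, and replacing `weights` with a constant list would give the rectangular weighting mentioned above.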

Example  (continued) 

The BILOG calibration of the 19 previously-described NAEP items with 100
examinees provided the posterior mode estimates $(\hat{a}_j, \hat{b}_j, \hat{c}_j)$ and the corresponding
large-sample approximation of the covariance matrix discussed above. Due to range restrictions
on the a’s and c’s, we worked with a multivariate normal (MVN) approximation for the
posterior of $\beta_j = (\log(a_j), b_j, \mathrm{logit}(c_j))$, where $\mathrm{logit}(c_j) = \log[c_j / (1 - c_j)]$. $p(\beta_j)$ was thus
approximated as MVN with mean vector $\hat{\beta}_j = (\log(\hat{a}_j), \hat{b}_j, \mathrm{logit}(\hat{c}_j))$ and covariance matrix
$\Sigma_{\beta_j}$ obtained through the delta method from the covariance matrix for the untransformed
parameters. Nonparametric and fitted 3PL ERFs were calculated for each item. Figure 2
presents results for the four items which previously appeared in Figure 1. The
nonparametric ERFs were obtained using 100 draws from $p(\beta_j)$ and a grid of 31
evenly-spaced $\theta$ values ranging from -3 to +3 in steps of .2. The fitted curves employed a
standard normal weighting function over the same range. The item response functions that
correspond to $F_j(\theta)$ evaluated with the point estimate $\hat{\beta}_j$ are also plotted for comparison.
These curves are noticeably steeper than the two expected response curves. Thus, one
effect of ignoring uncertainty about item parameters is a tendency to inflate belief about the
discriminating power of an item.
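The delta-method step can be sketched for a single item. Because the transformation to (log a, b, logit c) acts on each parameter separately, its Jacobian is diagonal, with entries 1/a, 1, and 1/[c(1 - c)]. The numbers below are the Item 1 entries from Tables 1 and 2; the function name is hypothetical.

```python
import math

def transform_item(a, b, c, cov):
    """Map (a, b, c) and its covariance matrix to the (log a, b, logit c) scale.

    cov is the 3x3 covariance matrix of (a, b, c). The delta method gives
    J * cov * J', where J is the diagonal Jacobian of the transformation.
    """
    mean_t = [math.log(a), b, math.log(c / (1.0 - c))]
    jac = [1.0 / a, 1.0, 1.0 / (c * (1.0 - c))]
    cov_t = [[jac[r] * cov[r][s] * jac[s] for s in range(3)] for r in range(3)]
    return mean_t, cov_t

# Item 1: point estimates from Table 1, variances and covariances from Table 2.
a_hat, b_hat, c_hat = 0.39, -1.59, 0.20
cov_item1 = [[0.059, 0.233, 0.001],
             [0.233, 1.212, 0.027],
             [0.001, 0.027, 0.008]]
mean_t, cov_t = transform_item(a_hat, b_hat, c_hat, cov_item1)
```

Drawing from the normal approximation on this scale and transforming back keeps every drawn a positive and every drawn c between 0 and 1.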


[[Figure  2  about  here]] 

For  most  of  the  19  items,  the  3PL  approximation  captured  the  nonparametric 
approximation  quite  well.  The  only  discrepancies  encountered  were  for  items  with  fairly 
high  a’s,  such  as  Item  19.  For  these  highly  discriminating  items,  the  fitted  curves  tended  to 
be  slightly  flatter  than  the  nonparametric  curves.  The  discrepancies  were  slightly  more 
pronounced  when  the  ERFs  were  recalculated  with  a  rectangular  weighting  function, 
indicating  that  they  are  related  to  the  inability  of  the  3PL  form  to  capture  the  pattern  of 
curvature  in  the  tails  of  the  theta  range. 

Figure 3 presents a comparison of results regarding Bayesian inference about $\theta$ for
a sample of 100 students. The plots show posterior means and associated posterior
standard deviations (PSDs) calculated using point estimates of the item parameters,
nonparametric ERFs, and fitted ERFs. In each case, the multiple imputation solution
(Equations 12 and 13) is employed as a standard of evaluation, as it is nonparametric and
accounts for dependencies among the parameters of different items. As can be seen, the
various methods for handling uncertainty about B have had negligible effect on the
calculation of posterior means. However, the effect on the associated PSDs is quite
pronounced. As would be expected, the practice of using point estimates of item parameters
as if they were known true values seriously understates the uncertainty associated with
examinees’ $\theta$s. This effect is less pronounced when ERFs are used. Table 3 presents
average PSDs calculated for the multiple imputation approach, the nonparametric and fitted
ERFs, and the point estimates. In this example, the PSD of a typical examinee’s $\theta$, when
calculated using point estimates of the item parameters, was understated by about 10%.
This can be attributed to ignoring uncertainty about B altogether. For the nonparametric
and fitted ERFs the understatement was only 3.6% and 3.9% respectively. This is
obtained by incorporating uncertainty about B item by item, but ignoring dependencies
across items. In terms of variance, about 60% of the typically-ignored variance was
accounted for in this example through the use of ERFs.
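One way to connect the PSD percentages to the figure of about 60% is to convert each understatement to the variance scale. The arithmetic below assumes the percentages are relative shortfalls in PSD against the multiple-imputation baseline and that "about 10%" is exactly 0.10; both are assumptions made for illustration, and the function name is hypothetical.

```python
def variance_shortfall(psd_understatement):
    """Share of baseline variance missed when the PSD is low by this fraction."""
    return 1.0 - (1.0 - psd_understatement) ** 2

point_estimates = variance_shortfall(0.10)     # item parameters treated as known
nonparametric = variance_shortfall(0.036)      # nonparametric ERFs
fitted = variance_shortfall(0.039)             # fitted 3PL ERFs

# Share of the variance ignored under point estimates that the ERFs recover.
recovered_nonparametric = (point_estimates - nonparametric) / point_estimates
recovered_fitted = (point_estimates - fitted) / point_estimates
```

Under these assumptions the point estimates miss 19% of the baseline variance, and the two ERF versions recover roughly 63% and 60% of that amount, in line with the figure quoted in the text.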
[[Figure 3 about here]]

[[Table  3  about  here]] 

Conclusion 

As increasingly ambitious applications push item response theory closer to the
boundaries of its applicability, increasingly strenuous efforts are required to deal with
issues of uncertainty, both as to model fit and knowledge of parameters within the model.
This paper addresses a problem of the latter type, namely, dealing with uncertainty about
item parameters. Fortunately, statisticians’ recent interest in numerical and Bayesian
approaches to such problems provides a variety of tools, each with its own strengths and
weaknesses to be matched with the purposes and characteristics of applications. Expected
response functions (ERFs) account for uncertainty that is usually ignored in a way that
allows us to employ familiar formulas for known item response functions, and even to apply
the same formulas but with attenuated parameter estimates. This would be especially
convenient in item-banking and adaptive-testing applications, in which tests are assembled
from collections of pre-calibrated items. Uncertainty about item parameters (under the
assumed model) would be implicit in the parameter estimates available at a given point in
time, no additional steps would be required at the point of calculating scores for individual
examinees, and improved knowledge about item parameters would merely require updating
a file of ERF parameters.
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Table  1 

Statistics  and  Point  Estimates  of  Item  Parameters  (a,6,c) 
for  19  NAEP  Mathematics  Items 


% 


Item 

Conect 

r-bis 

A 

a 

6 

A 

C 

1 

.78 

.35 

.39 

-1.59 

.20 

2 

.92 

.63 

.90 

-1.98 

.20 

3 

.78 

.45 

.55 

-1.23 

.19 

4 

.85 

.45 

.77 

-1.45 

.20 

5 

.79 

.45 

.63 

-1.17 

.20 

6 

.91 

.47 

.54 

-2.60 

.20 

7 

.65 

.65 

1.20 

-26 

.17 

8 

.86 

.64 

.99 

-1.37 

.18 

9 

.72 

.62 

1.22 

-.50 

.19 

10 

.67 

.61 

1.27 

-.26 

.20 

11 

.48 

.56 

1.96 

.53 

.23 

12 

.77 

.44 

.60 

-1.06 

.20 

13 

.85 

.59 

.95 

-1.30 

.19 

14 

.51 

.69 

1.89 

.20 

.15 

15 

.55 

.49 

.86 

.19 

.18 

16 

.43 

.41 

.65 

.81 

.16 

17 

.30 

.56 

1.10 

1.04 

.12 

18 

.53 

.44 

2.59 

.56 

.30 

19 

.21 

.63 

3.03 

1.09 

.10 
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Table  2 

Variances  and  Covariances  of  Item  Parameter  Estimates 
for  19  NAEP  Mathematics  Items 


Item 

Var(a) 

Cov(a,b) 

Var(b) 

Cov(a,c) 

Cov(b,c) 

Vaifc) 

1 

.059 

.233 

1.212 

.001 

.027 

.008 

2 

.435 

.632 

1.086 

.003 

.012 

.008 

3 

.069 

.141 

.509 

.002 

.019 

.008 

4 

.264 

.320 

.536 

.004 

.016 

.008 

5 

.119 

.174 

.401 

.004 

.020 

.008 

6 

.078 

.325 

1.673 

.001 

.017 

.008 

7 

.300 

.055 

.073 

.011 

.010 

.006 

8 

.208 

.179 

.251 

.003 

.011 

.007 

9 

.259 

.094 

.108 

.010 

.012 

.007 

10 

.339 

.074 

.077 

.016 

.012 

.007 

11 

2.513 

.056 

.058 

.053 

.008 

.006 

12 

.114 

.181 

.527 

.004 

.021 

.008 

»  13 

.280 

.261 

.354 

.004 

.012 

.007 

14 

1.519 

.073 

.041 

.034 

.007 

.004 

15 

.203 

.037 

.132 

.011 

.015 

.007 

16 

.118 

-.043 

.201 

.009 

.014 

.006 

17 

.366 

-.075 

.104 

.012 

.005 

.003 

18 

10.944 

.232 

.058 

.129 

.009 

.008 

19 

11.626 

-.210 

.051 

.042 

.002 

.002 

Table 3

Average Posterior Variances and Standard Deviations
for a Sample of 100 Examinees

Estimation Method     Average Posterior   Average Posterior   % Decrease
                      Variance            SD

Multiple Imputation   0.2151              .4585               -
Nonparametric ERF     0.1995              .4418               3.6
Fitted ERF            0.1977              .4406               3.9
Point Estimates       0.1743              .4113               10.3

Figure Captions

Figure 1. 100 Draws from Item Parameter Posterior Distributions for Four Items.

Figure 2. Item Response Functions for the Four Items.

Figure 3. Scatterplots of Posterior Means and Standard Deviations for 100 Examinees.


Appendix  A 

Pseudolikelihood Estimation of $\theta$ from Marginalized Likelihoods


The first section below paraphrases Arnold and Strauss's (1991; denoted AS
below) framework and results on pseudolikelihood estimation. The reader is referred to
AS for regularity conditions, proofs, and examples. The second section shows how this
framework accommodates likelihood estimation of $\theta$ using the product of expected
response curves.

Pseudolikelihood  Estimation 


Let $(X_1, \ldots, X_N)$ represent $N$ iid $n$-dimensional observations with common joint
density $f(x; \theta)$, where $\theta$ is an element of a $p$-dimensional parameter domain
$\Theta$. Denote by $S$ the set of all $n$-dimensional vectors consisting of 0's and 1's,
with at least one 1. For a particular $s$ in $S$, the random vector $X_i^{(s)}$ contains the
coordinates $X_{ij}$ of $X_i$ for which $s_j = 1$. For example, if
$X_i = (X_{i1}, X_{i2}, X_{i3})$ and $s = (1,0,1)$, then $X_i^{(s)} = (X_{i1}, X_{i3})$. The
density of $X_i^{(s)}$ will be denoted $f_s(x^{(s)}; \theta)$, although it may depend on only
some of the components of $\theta$. Let $\delta = \{\delta_s : s \in S\}$ be a vector of
$2^n - 1$ real numbers, not all zero, corresponding to the elements of $S$. The
pseudolikelihood $PL(\delta, \theta)$ of the data is defined by

$$PL(\delta, \theta) = \prod_{s \in S} \prod_{i=1}^{N} \left[ f_s(x_i^{(s)}; \theta) \right]^{\delta_s}. \qquad \text{(A1)}$$

Equivalently, in terms of logarithms,

$$\log PL(\delta, \theta) = \sum_{s \in S} \delta_s \sum_{i=1}^{N} \log f_s(x_i^{(s)}; \theta).$$

A pseudolikelihood estimate of $\theta$ (for a given $\delta$) is a value of $\theta$ that
maximizes (A1). Under regularity conditions, (A1) can be maximized by solving the
pseudolikelihood equations, obtained by differentiating the log of the pseudolikelihood with
respect to the elements of $\theta$ and setting them to zero; that is,

$$\frac{\partial}{\partial \theta_k} \log PL(\delta, \theta) = \sum_{s \in S} \delta_s \sum_{i=1}^{N} \frac{\partial f_s(x_i^{(s)}; \theta) / \partial \theta_k}{f_s(x_i^{(s)}; \theta)} = 0 \quad \text{for } k = 1, \ldots, p. \qquad \text{(A2)}$$

If regularity conditions given in AS for $f$ and the $f_s$'s are satisfied, then with
probability tending to 1 as $N \to \infty$ the pseudolikelihood equation (A2) has a root
$\hat\theta_N$ such that $\hat\theta_N \xrightarrow{p} \theta_0$, the true parameter value;
i.e., the pseudolikelihood estimator is consistent. (The regularity conditions ensure, among
other things, that the choice of $\delta$ does not omit any elements of a multidimensional
$\theta$ from $PL(\delta, \theta)$.) Moreover, the pseudolikelihood estimator is
asymptotically normal. AS give an expression for its large-sample variance, which depends on
the choice of $\delta$ and is bounded from below by the large-sample variance of the MLE. In
the univariate case, any consistent sequence
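$\hat\theta_N = \hat\theta_N(X_1, \ldots, X_N)$ of roots of (A2) satisfies

$$\sqrt{N}\,(\hat\theta_N - \theta_0) \xrightarrow{d} N\!\left(0,\ K_\delta(\theta_0) / [J_\delta(\theta_0)]^2\right), \qquad \text{(A3)}$$

where

$$K_\delta(\theta) = \sum_{s \in S} \sum_{t \in S} \delta_s \delta_t \, E\!\left[ \frac{\partial \log f_s(X^{(s)}; \theta)}{\partial \theta} \, \frac{\partial \log f_t(X^{(t)}; \theta)}{\partial \theta} \right]$$

and

$$J_\delta(\theta) = \sum_{s \in S} \delta_s \, E\!\left[ \left( \frac{\partial \log f_s(X^{(s)}; \theta)}{\partial \theta} \right)^{2} \right].$$
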

Application  to  Expected  Response  Curves 

The above results can be applied to the estimation of examinee ability under an IRT
model. Let $X = (X_1, \ldots, X_n)$ represent a response vector from an examinee to $n$ items,
governed by the IRT model $F_j(\theta) \equiv P(X_j = 1 \mid \theta, \beta_j)$ with

$$P(X = x \mid \theta, B) = \prod_{j=1}^{n} [F_j(\theta)]^{x_j} [1 - F_j(\theta)]^{1 - x_j}.$$

Let knowledge about $B$ be expressed as $p(B)$. The marginalized likelihood function for
maximum likelihood estimation of $\theta$ is

$$P(X = x \mid \theta) = \int \left\{ \prod_{j=1}^{n} [F_j(\theta)]^{x_j} [1 - F_j(\theta)]^{1 - x_j} \right\} p(B)\, \partial B.$$

For pseudolikelihood estimation, define $\delta$ as a selector for the subspace of $S$
consisting of vectors that isolate a single item response; i.e.,

$$\delta_s = \begin{cases} 1 & \text{if } \sum_j s_j = 1 \\ 0 & \text{otherwise.} \end{cases}$$

The pseudolikelihood $PL(\delta, \theta)$ corresponding to one observed response vector (i.e.,
$N = 1$) is obtained by specializing (A1) as follows:

$$PL(\delta, \theta) = \prod_{s \in S} \left[ f_s(x^{(s)}; \theta) \right]^{\delta_s} = \prod_{j=1}^{n} P(X_j = x_j \mid \theta) = \prod_{j=1}^{n} [F_j^*(\theta)]^{x_j} [1 - F_j^*(\theta)]^{1 - x_j},$$

where $F_j^*(\theta)$ is the expected response curve for Item $j$.

If knowledge about items is independent, i.e., $p(B) = \prod_j p(\beta_j)$, then the
asymptotic variance of the pseudolikelihood estimate (A3) simplifies to the usual inverse of
the sum of Fisher information over items, as calculated with expected response curves.
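
The independence case can be made explicit with a short derivation, offered here as a sketch rather than as part of the AS results. Because each $x_j$ is 0 or 1, the factor for item $j$ is either $F_j(\theta)$ or $1 - F_j(\theta)$, which is linear in $F_j(\theta)$; when $p(B)$ factors over items, the integral of the product is therefore the product of the item-level integrals, and the marginalized likelihood coincides with the product of expected response curves:

```latex
\begin{aligned}
P(X = x \mid \theta)
  &= \int \prod_{j=1}^{n} [F_j(\theta)]^{x_j}[1 - F_j(\theta)]^{1-x_j}
     \prod_{j=1}^{n} p(\beta_j)\, \partial B \\
  &= \prod_{j=1}^{n} \int [F_j(\theta)]^{x_j}[1 - F_j(\theta)]^{1-x_j}\,
     p(\beta_j)\, \partial \beta_j \\
  &= \prod_{j=1}^{n} [F_j^*(\theta)]^{x_j}[1 - F_j^*(\theta)]^{1-x_j},
  \qquad
  F_j^*(\theta) = \int F_j(\theta)\, p(\beta_j)\, \partial \beta_j .
\end{aligned}
```

When the item posteriors are not independent, the second equality fails, and the product of expected response curves is a pseudolikelihood rather than the full marginal likelihood.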

The AS consistency results imply the asymptotic equivalence of maximizing values
of the full marginal likelihood, which does take dependencies among parameters from
different items into account, and the product of the expected response curves, which does
not, for large samples of response vectors for the same $\theta$. Since we typically observe
only one response vector per examinee in practical work, small-sample behavior remains to be
examined.
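
As an illustration of estimation from a single response vector, the following minimal sketch maximizes the product of expected response curves over a grid of ability values and reports the inverse square root of the summed item information as a standard error. It assumes that each ERF has been approximated by a 3PL curve with scaling constant D = 1.7; the item parameters, the grid limits, and all function names are hypothetical and are not taken from the EXPRESFN program.

```python
import math


def logistic_3pl(theta, a, b, c, D=1.7):
    """3PL response curve; D = 1.7 is an assumed scaling constant."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def log_pseudolikelihood(theta, responses, curves):
    """Sum over items of x_j*log F*_j(theta) + (1 - x_j)*log(1 - F*_j(theta))."""
    total = 0.0
    for x, curve in zip(responses, curves):
        p = curve(theta)
        total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total


def information(theta, curves, h=1e-4):
    """Sum of [F*'(theta)]^2 / (F*(1 - F*)); slopes by central differences."""
    info = 0.0
    for curve in curves:
        p = curve(theta)
        slope = (curve(theta + h) - curve(theta - h)) / (2.0 * h)
        info += slope * slope / (p * (1.0 - p))
    return info


def pl_estimate(responses, curves, lo=-4.0, hi=4.0, steps=801):
    """Grid maximizer of the pseudolikelihood and an information-based SE."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    theta_hat = max(grid, key=lambda t: log_pseudolikelihood(t, responses, curves))
    return theta_hat, 1.0 / math.sqrt(information(theta_hat, curves))


# Hypothetical fitted-ERF parameters (a, b, c) for five items.
params = [(1.0, -1.5, 0.2), (1.2, -0.5, 0.2), (0.9, 0.0, 0.2),
          (1.1, 0.5, 0.2), (1.3, 1.5, 0.2)]
curves = [lambda t, p=p: logistic_3pl(t, *p) for p in params]
theta_hat, se = pl_estimate([1, 1, 0, 1, 0], curves)
```

With so few items the standard error is large, which is the small-sample situation noted above; the sketch makes no claim about how well the asymptotic variance describes it.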


Appendix  B 

Program  Documentation 

This appendix provides detailed documentation for two computer programs: EXPRESFN
and PLOTIRF. The EXPRESFN program computes EXPected RESponse FuNctions, both
nonparametric and fitted, for a set of items, given a set of multivariate normal item parameter
posterior distributions specified in terms of a set of mean vectors and an associated set of
independent variance-covariance matrices. The PLOTIRF program provides plots of all
estimated curves.

The  EXPRESFN  Program 

The  EXPRESFN  program  assumes  that  item  responses  may  be  modeled  using  a  2PL  or  a 
3PL IRT  model.  Both  nonparametric  and  fitted  expected  response  functions  are  estimated  for 
all  items.  The  procedures  used  to  estimate  the  fitted  expected  response  functions  are  very 
similar  to  the  procedures  employed  in  LOGIST.  The  program  also  computes  EAP  ability 
estimates  and  standard  errors  for  a  set  of  examinees  using  the  nonparametric  and  fitted 
expected  response  functions  as  well  as  the  point  estimates  of  the  item  parameter  means. 
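
The EAP computation mentioned above can be sketched as follows. This is a minimal illustration rather than the EXPRESFN code: it assumes a normal prior on ability (the program takes a prior mean and standard deviation as input), an equally spaced quadrature grid, and response curves supplied as ordinary functions, so the same routine can be run with point-estimate IRFs, nonparametric ERFs, or fitted ERFs. The item parameters in the usage lines are hypothetical.

```python
import math


def eap(responses, curves, prior_mean=0.0, prior_sd=1.0, lo=-4.0, hi=4.0, points=81):
    """Posterior mean and SD of ability on an equally spaced grid (normal prior)."""
    grid = [lo + (hi - lo) * i / (points - 1) for i in range(points)]
    post = []
    for t in grid:
        # Unnormalized prior weight times the likelihood of the response pattern.
        w = math.exp(-0.5 * ((t - prior_mean) / prior_sd) ** 2)
        for x, curve in zip(responses, curves):
            p = curve(t)
            w *= p if x == 1 else 1.0 - p
        post.append(w)
    total = sum(post)
    mean = sum(t * w for t, w in zip(grid, post)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, post)) / total
    return mean, math.sqrt(var)


# Hypothetical 2PL-type curves (slope a, location b) with an assumed D = 1.7.
params = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0)]
curves = [lambda t, a=a, b=b: 1.0 / (1.0 + math.exp(-1.7 * a * (t - b)))
          for a, b in params]
mean_hi, sd_hi = eap([1, 1, 1], curves)
mean_lo, sd_lo = eap([0, 0, 0], curves)
```

Running the routine once with point-estimate curves and once with expected response curves gives the kind of comparison summarized in Table 3.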

The  program  has  the  following  options: 

1.  The  user  may  specify  either  a  2PL  or  a  3PL  model 

2. The input point estimates of the item parameter means and variance-covariance
matrices may be specified on the (a, b, c) scale or on the transformed (log(a), b, logit(c)) scale.

3. The range of the $\theta$ grid and the total number of grid points may be specified.

4. In computing the fitted expected response function, the weighting distribution may be
either normal or rectangular and the sum of the weights, i.e., the total number of
pseudo-examinees, may be specified.

5.  In  estimating  the  item  parameters  for  the  fitted  expected  response  functions,  the 
iterative  procedure  requires  initial  item  parameter  estimates.  The  program  supplies  default 
values  for  these  initial  estimates.  However,  the  user  may  set  all  initial  a’s  to  a  given  value, 
all  initial  c’s  to  a  specified  value  or  may  supply  the  initial  values. 

6.  To  control  the  problem  of  estimating  c’s  when  the  fitted  expected  response  function 
becomes  asymptotic  below  the  minimum  ability  of  interest,  one  may  fix  the  c’s  at  a  common 
c  for  items  where  the  estimated  b-2/a  is  less  than  some  criterion,  fix  all  c’s  at  a  common  c, 
put  a  beta  prior  on  the  c’s  and  estimate  the  mean  of  the  prior,  or  put  a  beta  prior  on  the  c’s 
fixing  the  mean  at  a  value  specified  by  the  user.  The  common  c  may  be  fixed  or  estimated. 

7. Abilities may be estimated for an existing set of item responses or for a set of
responses generated by the program for a random sample of examinees drawn from either a
normal or rectangular distribution. The generated data can be used to assess the differences
between the abilities estimated using the three item response functions.


Nonparametric Expected Response Function


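The nonparametric expected response function estimation procedure requires point
estimates of the item parameters and associated variance-covariance matrices expressed on a
transformed scale. If the input data have not already been transformed, then the following
transformations will be applied:

$$\begin{aligned}
a_j^* &= \log(a_j) \\
b_j^* &= b_j \\
c_j^* &= \log\left( c_j / (1 - c_j) \right) \\
\operatorname{var}(a_j^*) &= \operatorname{var}(a_j) / a_j^2 \\
\operatorname{cov}(a_j^*, b_j^*) &= \operatorname{cov}(a_j, b_j) / a_j \\
\operatorname{cov}(a_j^*, c_j^*) &= \operatorname{cov}(a_j, c_j) / \left( a_j c_j (1 - c_j) \right) \\
\operatorname{var}(b_j^*) &= \operatorname{var}(b_j) \\
\operatorname{cov}(b_j^*, c_j^*) &= \operatorname{cov}(b_j, c_j) / \left( c_j (1 - c_j) \right) \\
\operatorname{var}(c_j^*) &= \operatorname{var}(c_j) / \left( c_j (1 - c_j) \right)^2
\end{aligned}$$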
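
These are first-order (delta-method) approximations: each variance or covariance is multiplied by the derivatives of the two transformations involved, which are 1/a for log(a), 1 for b, and 1/(c(1 - c)) for logit(c). A minimal sketch, with a hypothetical function name, applied to Item 10 of Tables 1 and 2:

```python
import math


def to_transformed_scale(a, b, c, cov):
    """Map (a, b, c) and its 3x3 covariance matrix to the (log a, b, logit c) scale.

    Uses the diagonal Jacobian diag(1/a, 1, 1/(c(1-c))) of the transformation,
    so cov_t[r][s] = jac[r] * cov[r][s] * jac[s].
    """
    jac = [1.0 / a, 1.0, 1.0 / (c * (1.0 - c))]
    mean_t = [math.log(a), b, math.log(c / (1.0 - c))]
    cov_t = [[jac[r] * cov[r][s] * jac[s] for s in range(3)] for r in range(3)]
    return mean_t, cov_t


# Item 10 from Tables 1 and 2; rows and columns are ordered (a, b, c).
cov10 = [[0.339, 0.074, 0.016],
         [0.074, 0.077, 0.012],
         [0.016, 0.012, 0.007]]
mean_t, cov_t = to_transformed_scale(1.27, -0.26, 0.20, cov10)
```

The transformation is undefined when c is 0, which is consistent with the program holding c fixed in that case.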


A grid of $M$ $\theta$ values is specified from $\theta_{\min}$ to $\theta_{\max}$. Then a
random sample of $K$ parameter values is drawn from the multivariate normal distribution with
means $a_j^*$, $b_j^*$, $c_j^*$ and with the transformed variance-covariance matrix,
$\Sigma(\beta_j)$. If the point estimate of $c_j$ is 0, $c_j$ is held fixed and only
$\log a_j$ and $b_j$ are sampled. The $c_j$ for this item will also not be estimated for the
fitted ERF. If the point estimate for $c_j$ is less than or equal to .001, the mean for $c_j$
used for the multivariate normal is set to the standard error of c. $F_j^{(k)}(\theta_m)$ is
computed for each of the $M$ values of $\theta$ for each of the $K$ IRFs. $F_j^*(\theta_m)$
is the average of the $F_j^{(k)}(\theta_m)$'s.
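
The sampling procedure described above can be sketched in a few lines. This is an illustration under stated assumptions, not the EXPRESFN code: it uses a 3PL curve with D = 1.7, draws from the multivariate normal through a Cholesky factor, and does not implement the special handling of items whose c estimate is 0 or at most .001. The transformed mean vector and covariance matrix in the usage lines are rounded, illustrative values; the grid, the number of draws, and the seed follow the program defaults listed later in this appendix.

```python
import math
import random


def cholesky(m):
    """Lower-triangular Cholesky factor of a small symmetric positive-definite matrix."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def irf_3pl(theta, a, b, c, D=1.7):
    """3PL item response function; D = 1.7 is an assumed scaling constant."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def nonparametric_erf(mean_t, cov_t, grid, draws=100, seed=275927):
    """Average the IRFs of `draws` parameter samples at every grid point."""
    rng = random.Random(seed)
    low = cholesky(cov_t)
    erf = [0.0] * len(grid)
    for _ in range(draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(3)]
        # One draw on the (log a, b, logit c) scale: mean + L z.
        log_a, b, logit_c = (mean_t[i] + sum(low[i][k] * z[k] for k in range(i + 1))
                             for i in range(3))
        a = math.exp(log_a)
        c = 1.0 / (1.0 + math.exp(-logit_c))
        for m, theta in enumerate(grid):
            erf[m] += irf_3pl(theta, a, b, c) / draws
    return erf


grid = [-3.0 + 0.2 * m for m in range(31)]
mean_t = [math.log(1.27), -0.26, math.log(0.20 / 0.80)]
cov_t = [[0.210, 0.058, 0.079],
         [0.058, 0.077, 0.075],
         [0.079, 0.075, 0.273]]
erf = nonparametric_erf(mean_t, cov_t, grid)
```

Sampling on the transformed scale keeps every drawn a positive and every drawn c between 0 and 1, so each sampled IRF, and hence their average, is increasing in $\theta$.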
Fitted Expected Response Function


The nonparametric ERF is the input for estimating the parameters of the fitted
expected response function. The abilities are fixed at the $\theta_m$ values in the grid. A
sample of pseudo-examinees is generated to weight the grid values according to a weighting
distribution specified by the user. The distribution may be either normal or rectangular. If
normal, the user may specify the mean and standard deviation. The user specifies the number of
examinees for the sample. Newton's method is used to solve for the corrections to the
estimated parameters by solving the likelihood equations. Since there are no omits, this
procedure uses the expected values of the second derivatives, which removes any possibility
of nonpositive definite matrices. If an item has a zero determinant, the item is removed from
further estimation and the parameters are set to their values from before the zero
determinant was encountered.
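
One way to apportion the pseudo-examinees to the grid is sketched below. The text states only that the counts are adjusted to be integers; the largest-remainder rounding used here, which keeps the total unchanged, is an assumption, as is the function name. The defaults of 3100 pseudo-examinees and a standard normal weighting distribution follow the parameter table later in this appendix.

```python
import math


def apportion(grid, total=3100, mean=0.0, sd=1.0):
    """Split `total` pseudo-examinees over grid points in proportion to a normal density.

    Largest-remainder rounding is an assumption; the source only says that the
    counts are adjusted so that each grid point receives an integral number.
    """
    dens = [math.exp(-0.5 * ((t - mean) / sd) ** 2) for t in grid]
    scale = total / sum(dens)
    exact = [d * scale for d in dens]
    counts = [int(math.floor(e)) for e in exact]
    leftover = total - sum(counts)
    # Hand the remaining examinees to the points with the largest fractional parts.
    order = sorted(range(len(grid)), key=lambda i: exact[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


grid = [-3.0 + 0.2 * m for m in range(31)]
counts = apportion(grid)
```

For a rectangular weighting distribution the same routine applies with a constant density in place of the normal one.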


The iteration procedure requires initial values for the item parameters. The default
value for a is one. The default value for c is 1/(# choices) - .05. The default value for b is a
function of the proportion correct. The formulas to compute the default values of b are:

[[Formulas for the default b and for the quantity $h_j$ on which it depends are not legible in the source.]]

In these formulas, N is the number of pseudo-examinees.

The procedure estimates the parameters for one item at a time until the relative change
in a is less than .001 if a is being estimated. If a is fixed, the procedure iterates until the
change in b is less than .001. One pass through all of the items constitutes a stage. In the
first stage the c's are held fixed. In the second and following stages the c's are estimated
unless a two-parameter model is requested. If all c's are being estimated, or there is a prior
on the c's, stages are repeated until the change in the likelihood is less than .02% between
stages.


If no prior is imposed on the c's and the poorly estimated c's are restricted to a
common c value, the following procedure is used:

In the second and third stages the c's for all items are estimated.

At the end of the third stage, the c's for items with b-2/a less than the criterion for
fixing the c (CRITFIXC) are fixed at a common c value. If all c's are to be
fixed at a common c value, they are set to the common c value at this point.

The common c value is then estimated once per stage until the change in the common
c is less than the standard error of the common c estimate for two successive
stages. Only the items with c fixed at the common c are estimated in these
stages.

The common c is then fixed and all items are again estimated until the criterion
function increases by less than .02%.

If a prior on c is requested and the mean is estimated, the mean is computed as the
average of the c's at the end of each stage. Note: the beta prior is included in the
computation of the likelihood, and since the mean is not actually a maximum likelihood
estimate of the mean, the likelihood may not increase uniformly. To prevent premature
stopping of the estimation procedure in this situation, the procedure will continue until the
maximum difference between IRFs between stages is less than .001. The difference is
computed for 5 abilities from -2 to 2 at intervals of 1.

The  a  parameter  is  restricted  to  a  range  of  .01  to  99,  c  to  a  range  of  0.  to  .99.  The 
maximum  amount  that  a  parameter  may  change  in  any  iteration  is  restricted.  The  amount  for 
a  is  .1  times  the  previous  value  for  a  plus  .2,  b  is  .1  times  previous  value  of  b  plus  .4,  and  c 
is  .06. 
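
The first stage of this procedure (c held fixed, a and b estimated for one item) can be sketched as follows. This is an illustration of Fisher scoring with the step limits quoted above, not a reproduction of the EXPRESFN or LOGIST code. The assumptions are a 3PL curve with D = 1.7, a pseudo-binomial likelihood in which the nonparametric ERF values act as observed proportions at each grid point, and a stopping rule that checks the change in b as well as the relative change in a, which is slightly stricter than the rule in the text.

```python
import math

D = 1.7  # assumed logistic scaling constant


def curve(theta, a, b, c):
    """3PL response curve."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def fit_ab(grid, weights, target, c, a=1.0, b=0.0, max_iter=200):
    """Fit a and b (c held fixed) to target proportions by Fisher scoring."""
    for _ in range(max_iter):
        ga = gb = iaa = iab = ibb = 0.0
        for t, w, y in zip(grid, weights, target):
            p = min(max(curve(t, a, b, c), 1e-12), 1.0 - 1e-12)
            s = (p - c) * (1.0 - p) / (1.0 - c)      # (1 - c) * sigma * (1 - sigma)
            dpa, dpb = D * (t - b) * s, -D * a * s   # dP/da and dP/db
            v = w / (p * (1.0 - p))
            ga += v * (y - p) * dpa                  # score for a
            gb += v * (y - p) * dpb                  # score for b
            iaa += v * dpa * dpa                     # expected information
            iab += v * dpa * dpb
            ibb += v * dpb * dpb
        det = iaa * ibb - iab * iab
        if det <= 0.0:
            break  # the program removes such an item; this sketch simply stops
        da = (ibb * ga - iab * gb) / det
        db = (iaa * gb - iab * ga) / det
        # Step limits quoted in the text: .1*a + .2 for a and .1*|b| + .4 for b.
        lim_a, lim_b = 0.1 * a + 0.2, 0.1 * abs(b) + 0.4
        da = max(-lim_a, min(lim_a, da))
        db = max(-lim_b, min(lim_b, db))
        a_new = max(0.01, min(99.0, a + da))         # range restriction on a
        converged = abs(a_new - a) / a < 0.001 and abs(db) < 0.001
        a, b = a_new, b + db
        if converged:
            break
    return a, b


grid = [-3.0 + 0.2 * m for m in range(31)]
weights = [math.exp(-0.5 * t * t) for t in grid]
# Hypothetical target: an exact 3PL curve, so the fit should recover a and b.
target = [curve(t, 1.2, -0.5, 0.2) for t in grid]
a_hat, b_hat = fit_ab(grid, weights, target, c=0.2)
```

In practice the target would be a nonparametric ERF, which is not exactly of 3PL form, and the fitted a would typically be attenuated relative to the point estimate.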

Input 

The  input  to  the  program  consists  of  a  sysin  file  containing  file  names  for  the  input 
and  output  files  and  parameters  for  controlling  the  procedure  and  a  file  containing  the  point 
estimates  for  the  parameters  and  the  variance-covariance  of  these  estimates.  If  abilities  are  to 
be  estimated  for  a  group  of  examinees,  the  file  of  their  responses  is  also  read. 

The Sysin File.

Record Set 1:

The first set of records in the sysin file defines the input and output files.

The set contains one record for each file to be defined. The last record in this set must be
blank. The format for the file definition card is:

col 1        F
col 3 - 4    Unit number
col 6 - 45   File name, with all qualifiers

The files to be specified are:

Input files:

Unit 5    File containing the sysin dataset

Unit 10   File containing, for each item, the point estimates and the
          variance-covariance matrix. They may be either on the a,b,c
          scale or on the log a, b, logit c scale but both the point estimates
          and the variance-covariance matrix must be on the same scale.

Unit 11   Input file containing the examinee responses if abilities are to be
          estimated for an existing item response file.

Output files:

Unit 6    Printed output file

Unit 7    Item parameter output in LOGIST7 format. The abilities written
          are the pseudo-abilities used to estimate the fitted ERF's.

Unit 12   Binary scratch output file, used to temporarily store the
          nonparametric ERF's and then the examinee responses.

Unit 13   Output file containing the point estimate item parameters, the
          fitted ERF, and the nonparametric ERF for each item.

Unit 14   Output file containing the sample of item response functions, if
          it was requested that the sample be saved.

Unit 15   Output file containing ability estimates, standard errors, and item
          responses, if abilities are estimated.


Record  Set  2. 

In record set 2, the options for running the procedure are specified. Only those
options where the default says "Required" must be specified. The required parameters are the
title, the number of items, the number of choices per item, and the format for reading the
point estimates file. Defaults are supplied for all of the other parameters. The parameters are
specified by entering the parameter name in positions 1 through 11 of the record and the
value in positions 13 through 20. Formats are entered in positions 13 - 80. Right justify all
integer values. The last record in this set must be blank.
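
The fixed-column layouts of record sets 1 and 2 can be read with simple string slicing. The sketch below is illustrative only: the function names and the file names in the usage lines are hypothetical, and treating TITLE like a format (value in positions 13 - 80) is an assumption, since the text gives the wider field only for formats.

```python
def parse_file_card(record):
    """File definition card: 'F' in col 1, unit in cols 3-4, file name in cols 6-45."""
    if record[:1] != "F":
        raise ValueError("file definition card must start with F")
    return int(record[2:4]), record[5:45].strip()


def parse_parameter_record(record):
    """Parameter record: name in cols 1-11, value in cols 13-20 (formats: cols 13-80)."""
    name = record[:11].strip()
    # Assumption: TITLE, like the FMT... parameters, may use the wide field.
    wide = name.startswith("FMT") or name == "TITLE"
    value = record[12:80].strip() if wide else record[12:20].strip()
    return name, value


def read_record_set(lines, parse):
    """Parse records until the blank record that ends the set."""
    parsed = []
    for line in lines:
        if not line.strip():
            break
        parsed.append(parse(line.rstrip("\n")))
    return parsed


files = read_record_set(["F  5 sysin.dat", "F 10 items.dat", "", "F 11 ignored.dat"],
                        parse_file_card)
options = read_record_set(["#ITEMS".ljust(12) + "19".rjust(8),
                           "FMTVAR".ljust(12) + "(I4,3F8.3,6F8.4)",
                           ""],
                          parse_parameter_record)
```

Values are returned as strings; converting them to integers or reals would depend on the parameter, as listed in the table that follows.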


Parameter input:

Parameter   Description / Options                                 Default

TITLE       Title for the run                                     Required

#ITEMS      Number of items. (Maximum 800)                        Required

SEED        Random number seed. Integer between 0 and 1048576.    275927

DEBUG       Debugging printout?                                   NO

ITEMIDEN    Read in 8-character item identification codes?        NO

GENFIXC     Is c fixed in var/cov, i.e., var/cov for c are 0?     NO
            If so, c will be fixed in fitting the ERF.

IFTRANS     Are the input point estimates and var/cov matrix      NO
            on the log a, b, logit c scale?

FMTVAR      Format for reading point estimates and var/cov        Required
            matrix. The values are read in following order:
            item number, a, b, c, var(a), cov(a,b), cov(a,c),
            var(b), cov(b,c), var(c). If abilities are to be
            estimated for a group of examinees, the item
            number must be the sequence number of the item in
            the record of item responses.

#SAMPIRF    Number of item parameter values to sample             100
            (Maximum = 1,000)

MINTHETA    Minimum ability for $\theta$ grid                     -3.

MAXTHETA    Maximum ability for $\theta$ grid                     3.

#ABILGRP    Number of points in $\theta$ grid. (Maximum 201)      31

WEIGHTFN    Weighting distribution for fitting ERF. Enter         NORMAL
            RECTANGULAR or NORMAL

WEIGHTMN    If weighting distribution NORMAL, specify mean        0.

WEIGHTSD    If weighting distribution NORMAL, specify             1.
            standard deviation.

#ERFEXAM    Number of pseudo examinees for estimating the         3100
            fitted ERF's. These will be apportioned by the
            weighting distribution to the M $\theta$ grid points
            and adjusted so that there is an integral number of
            examinees at each grid point.

SAVESAMP    Save the sample of item response functions to a       NO
            file?

READA       Read in initial a's?                                  NO

READB       Read in initial b's?                                  NO

READC       Read in initial c's?                                  NO

PRIORC      Prior on c?                                           0
            0 - no, estimate all c's, don't fix any at the
                common c value.
            1 - no, fix c's at a common c (COMCx) if
                b-2/a < CRITFIXC. Estimate COMCx.
            2 - no, fix all items at a common c. Estimate
                COMCx.
            3 - yes, estimate the mean of prior.
            4 - yes, fix the mean of prior.

CRITFIXC    Criterion for fixing c, if no prior requested and     -2.3
            PRIORC = 1.

AINIT       Initial a value, if READA is NO.                      1.

AMAX        Maximum a.                                            99.0

PARMCODE    What parameters are to be estimated                   3
            -1 - read in parmcode for each item
            Otherwise set parameter code for all items to the
            specified code. The definitions of the codes are:
              code   parameters estimated
              2      a,b
              3      a,b,c

CHOICESx    Number of choices per item. x indicates a sequence    Required
            number for different item types. Specify a
            different CHOICESx for each item type. For example,
            if a test has 4 and 3 choice items, set CHOICES1 to
            4 and CHOICES2 to 3. x must be between 0 and 98.

CINITx      Initial c for the CHOICESx items.                     1/CHOICESx - .05

COMCx       If no prior on c, common c value for the CHOICESx     1/CHOICESx - .05
            items. If prior on c, mean c of prior for the
            CHOICESx items.

N-INFx      This is only used if there is a prior on c. It is     20
            the weight for the prior on c in terms of the
            number in a hypothetical group of examinees at
            minus infinity. It controls the variance of the
            beta prior. A separate N-INFx must be specified
            for every CHOICESx alternatives.

CHIx        Maximum c                                             .99

ESTABIL     Estimate abilities?                                   NO

#EXAMINEE   Number of examinees for which abilities are to be     20
            estimated if ESTABIL=YES. (Maximum 10,000)

PRIORMN     Prior mean of p($\theta$)                             0.

PRIORSD     Prior standard deviation of p($\theta$)               1.

GENRESP     Generate artificial data, abilities and item          YES
            responses.

DISTABIL    If generating artificial data, specify type of        RECTANGULAR
            ability distribution to generate, either
            'RECTANGULAR' or 'NORMAL'.

DISTMN      If DISTABIL is 'NORMAL', specify the mean of the      0.
            distribution.

DISTSD      If DISTABIL is 'NORMAL', specify the standard         1.
            deviation of the distribution.

RECTMIN     If DISTABIL is 'RECTANGULAR', specify minimum         -3.
            ability for distribution.

RECTMAX     If DISTABIL is 'RECTANGULAR', specify maximum         3.
            ability for distribution.

FMTRESP     If reading in examinee responses, specify format      Required if
            for reading the item responses. They will be          ESTABIL=YES
            selected as specified by item number read from        and
            the point estimates file. They are read in integer    GENRESP=NO.
            format. As many integer fields must be specified
            as the maximum item number read from the point
            estimates. For example, if the item numbers read
            from the point estimates are 1, 5, and 10, the
            format must specify reading in 10 integer fields.

Additional input:

If  PARMCODE  =  -1,  read  in  a  parameter  code  for  each  item  with  Record  set  3. 

If  more  than  one  CHOICESx  read,  specify  the  items  for  each  number  of  choices  in 
Record  set  4. 

If  ITEMIDEN  requested,  read  in  item  identification  in  Record  set  5. 

Record  set  3. 

This record set is only required if PARMCODE is set to -1 to read in a parameter
code for each item.

col 1 - 8     "PARMCODE"
col 9 - 10    Sequence number for this PARMCODE record.
col 11 - 80   Parameter codes for the items in 35I2 format.

Repeat for as many records as necessary, increasing the sequence number for each
record. For example, for items 36-40, the sequence number must be 2.
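
The relation between item numbers and sequence numbers can be made concrete with a small sketch. It assumes that the codes occupy 35 two-column integer fields in columns 11 - 80, so that each record holds 35 items; the function names are hypothetical.

```python
def record_sequence_number(item, per_record=35):
    """1-based sequence number of the record that holds a given 1-based item."""
    return (item - 1) // per_record + 1


def parmcode_records(codes, per_record=35):
    """Format per-item parameter codes as PARMCODE records (I2 fields from col 11)."""
    records = []
    for start in range(0, len(codes), per_record):
        seq = start // per_record + 1
        fields = "".join(f"{code:2d}" for code in codes[start:start + per_record])
        records.append(f"PARMCODE{seq:2d}{fields}")
    return records


records = parmcode_records([3] * 40)
```

For a 40-item test this produces two records, the second carrying the codes for items 36-40, in agreement with the example in the text.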

Record  set  4. 

This record set is only necessary if more than one CHOICESx is specified. It is used
to specify the number of choices for each item.

col 1 - 8     "CHOICESx", where x corresponds to the CHOICESx specified on the
              parameter records.
col 9 - 10    Sequence number for this CHOICESx record.
col 11 - 80   Item numbers of the items that have the number of choices specified by
              CHOICESx, read in (10I5) format. A sequence of items can be
              specified by specifying the first number in the sequence followed by
              the negative of the last number in the sequence.

Enter as many CHOICESx records as necessary, increasing the sequence number for
each record. Do not split a sequence across two records. If the beginning of a
sequence would be the last field of a record, leave the last field blank and start
the sequence on the next record.
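
The convention of a first item number followed by the negative of the last can be expanded as in the following sketch. The function names are hypothetical, and the field width of 5 assumes the item numbers are read as five-column integers starting in column 11.

```python
def expand_items(fields):
    """Expand item numbers where 'first, -last' denotes the run first..last."""
    items = []
    for value in fields:
        if value < 0:
            if not items:
                raise ValueError("a negative entry must follow the first item of a sequence")
            # Continue the run from the item after the previous entry up to |value|.
            items.extend(range(items[-1] + 1, -value + 1))
        elif value > 0:
            items.append(value)
    return items


def parse_choices_record(record, width=5):
    """Split cols 11-80 of a CHOICESx record into integer fields; blank fields are skipped."""
    body = record[10:80]
    fields = [body[i:i + width] for i in range(0, len(body), width)]
    return [int(f) for f in fields if f.strip()]


record = "CHOICES1 1" + "    1" + "   -5" + "    9" + "   12" + "  -14"
items = expand_items(parse_choices_record(record))
```

Because a blank field is skipped rather than read as an item, leaving the last field of a record blank, as the text instructs, has no effect on the expanded list.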

Record  set  5. 

If ITEMIDEN is "YES", this set is required to read in the 8-character item
identification for each item.

col 1 - 8     "ITEMIDEN"
col 9 - 10    Sequence number
col 11 - 18   Item identification for the first item. Left justify the identification in the
              field.
col 19 - 20   Blank
col 21 - 28   Item identification for the second item.
col 29 - 30   Blank
etc.

Enter 7 item identifications per record; repeat for as many records as necessary,
increasing the sequence number for each record. For example, the record with
sequence number 2 will contain the identifications for items 8 through 14.

Detailed description of output

Unit 6 Printed output file
The printout contains:

Check on input parameters and defaults.

For the nonparametric ERF, the point estimates, the input var/cov matrix, the
var/cov for the sampled IRF's for both the a,b,c scale and the
transformed scale, and the nonparametric ERF for a spaced sample of
the $\theta$ grid points are printed.

For the estimation of the parameters for the fitted ERF, the likelihood is
printed for each stage as well as the maximum derivatives for the three
parameters, the maximum change in an iteration, and the maximum
change over all iterations for each type of parameter. If the common c
is being computed, information on the computation of the common c
values is printed.

For each item there is a parameter code that indicates which item parameters
are being estimated. The values for the codes are defined in the input
description. In addition, a 20 is added to the code if the c for an item
is held fixed at the common c. If an item is removed because the
expected matrix of second derivatives had a zero determinant, the
parameter code is set to 996.

The final item parameter estimates are printed as well as the standard errors of
the estimates.

If abilities are estimated, the EAP ability estimates and the standard errors are
printed for the point estimate IRF, the nonparametric ERF and the fitted
ERF. Only the first and last 10 are printed.

Unit 7 Item parameter output in LOGIST7 format. The abilities written are the
pseudo-abilities used to estimate the fitted ERF's. A subroutine to read this
file  is  included  with  the  program.  The  subroutine  contains  comment  statements 
that  describe  the  calling  arguments.  Output  includes  the  title,  the  number  of 
items,  the  number  of  pseudo-examinees,  the  estimated  item  parameters,  the 
pseudo-abilities,  variables  used  in  the  estimation  of  c,  and  parameter  code 
indicator  for  number  of  parameters  estimated. 

Unit 13 File containing the nonparametric item response functions for plotting with the
plot program. The first record contains the title of the run. The second record
contains the number of items (I5). The third record contains the M abilities for
the $\theta$ grid in the format (5X,10F8.4). The remaining records contain the item
sequence number, the item number, the item identification, the a,b,c point
estimates, a,b,c estimates for the fitted ERF, the parameter code, and the
nonparametric proportion correct for the M abilities in the format
(2I5,A8,1X,3F12.6,1X,3F12.6,I4/(10F12.6)).

Unit 14 Output file containing the sample of item response functions, if it was
requested that it be saved. For each item, the item number and the three
parameters for each sampled IRF are written in the format
(I4,12F12.6/(4X,12F12.6)).

Record  1:  col  1  -  4:  Item  number 

col  5  -  16:  a  for  first  item  sampled 
col  17  -28:  b  for  first  item  sampled 
col  29  -  40:  c  for  first  item  sampled 
col  41  -  52:  a  for  second  item  sampled 
etc.  etc. 

Unit  15  Output  file  containing  ability  estimates  and  standard  errors,  and  item  responses, 
if  abilities  are  estimated. 

For  each  examinee  a  record  is  written  in  the  format  (I5,7F12.6,600I1) 
containing: 

col  1-5:  examinee  sequence  number 

col  6  -  17  -  true  ability,  (if  responses  are  read,  this  is  set  to  999999.) 
col  18  -  29  -  EAP  ability  computed  using  point  estimate  IRF 
col  30  -  41  -  EAP  ability  computed  using  fitted  ERF 
col  42  -  53  -  EAP  ability  computed  using  nonparametric  ERF 
col  54  -  65  -  Standard  error  of  ability  computed  using  point  estimate 
IRF 

col  66  -  77  -  Standard  error  of  ability  computed  using  the  fitted  ERF 
col  78  -  89  -  Standard  error  of  ability  computed  using  nonparametric 
ERF 

col 90 + Item responses in I1 format, items 1 to #ITEMS.
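
A record of this file can be read back by slicing the columns listed above. The sketch below is illustrative; the function name and dictionary keys are hypothetical, and the example line is constructed only to exercise the parser.

```python
def parse_ability_record(line, n_items):
    """Slice one Unit 15 record written as (I5,7F12.6,600I1)."""
    names = ("true_ability", "eap_point", "eap_fitted", "eap_nonparametric",
             "se_point", "se_fitted", "se_nonparametric")
    # Seven 12-column real fields occupy cols 6-89 (0-based offsets 5..88).
    values = [float(line[5 + 12 * k: 17 + 12 * k]) for k in range(7)]
    record = dict(zip(names, values))
    record["examinee"] = int(line[0:5])
    record["responses"] = [int(ch) for ch in line[89:89 + n_items]]
    return record


# Constructed example line: sequence number, seven reals, five item responses.
line = f"{7:5d}" + "".join(f"{v:12.6f}" for v in (0.5, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6)) + "10110"
rec = parse_ability_record(line, n_items=5)
```

The three EAP estimates and three standard errors in each record are the quantities needed for comparisons such as those in Table 3 and Figure 3.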


The  PLOTERF  Program 

A plot program was also developed that plots the three item response functions for
comparison of the three curves. This program produces plots on the screen, a laser printer, or
a postscript printer. Input to the program consists of a sysin file with the control parameters
and the file written on unit 13 by the EXPRESFN program. One, four or eight plots per
page are possible.

Input 

The  sysin  file  consists  of  a  set  of  records  defining  the  input  and  output  files  and  a  few 
control  parameters. 

Record  set  defining  files. 

The  set  contains  one  record  for  each  file  to  be  defined. 

The  last  record  in  this  set  must  be  blank. 
The format for the file definition card is:

col  1  F 

col  3  -  4  Unit  number 

col  6  -  45  File  name,  with  all  qualifiers 
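
For readers preparing a sysin file by script, the card layout above can be produced and checked as in the following Python sketch. The function names are hypothetical and the sketch is not part of PLOTERF. It assumes the unit number is right-justified in columns 3 - 4 and the file name is left-justified in columns 6 - 45; the manual does not state the justification.

```python
def file_definition_card(unit, filename):
    """Build a 45-column file definition card: 'F' in col 1, unit in 3-4, name in 6-45."""
    if not 0 <= unit <= 99:
        raise ValueError("unit number must fit in two columns")
    if len(filename) > 40:
        raise ValueError("file name must fit in columns 6 - 45 (40 characters)")
    # col 1 'F', col 2 blank, col 3-4 unit, col 5 blank, col 6-45 file name
    return f"F {unit:2d} {filename:<40}"


def parse_file_definition_card(card):
    """Return (unit, filename) from a file definition card."""
    if not card.startswith("F"):
        raise ValueError("file definition cards start with F in column 1")
    return int(card[2:4]), card[5:45].strip()
```

For example, `file_definition_card(13, "expresfn.u13")` builds the card for the EXPRESFN output file, where the file name shown is only a placeholder.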

The  files  to  be  specified  are: 

Input  files: 

Unit  5  Sysin  file  containing  file  definitions  and  parameters. 

Unit 13 File written on unit 13 by EXPRESFN containing the nonparametric
item response functions.

Output  file: 

Unit 9 Plot output, if it was requested that the plots be saved for printing later.

Record  set  specifying  control  parameters. 

The last record in this set must be a blank record.

Parameter    Description/options                      Default

TITLE        Title for plots.                         Title from
                                                      EXPRESFN.

IFSELIT      Select items from items in               NO
             EXPRESFN run.

PLOTDEV      Plotting device:                         LASER
             POSTSCRIPT
             LASER - HP laser printer
             SCREEN - only display on screen.

#PLOTPAGE    Number of plots per page. Options are    8
             1, 4, or 8.

SAVEPLOT     Plot now or write plots to file?         NO
             NO - print plots now
             YES - save plots to a file for
             printing later.
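
The options and defaults listed above can be summarized as a small lookup that fills in defaults and rejects values outside the listed options. This is an illustrative Python sketch only; the names `DEFAULTS`, `ALLOWED`, and `resolve_controls` are hypothetical and do not describe how PLOTERF itself reads its control records. A TITLE of `None` stands for "use the title from EXPRESFN".

```python
# Defaults as listed in the parameter table; None means "title from EXPRESFN".
DEFAULTS = {
    "TITLE": None,
    "IFSELIT": "NO",
    "PLOTDEV": "LASER",
    "#PLOTPAGE": 8,
    "SAVEPLOT": "NO",
}

# Permitted values for the parameters that have a fixed set of options.
ALLOWED = {
    "IFSELIT": {"YES", "NO"},
    "PLOTDEV": {"POSTSCRIPT", "LASER", "SCREEN"},
    "#PLOTPAGE": {1, 4, 8},
    "SAVEPLOT": {"YES", "NO"},
}


def resolve_controls(overrides):
    """Merge user-supplied settings with the defaults and validate them."""
    settings = dict(DEFAULTS)
    for name, value in overrides.items():
        if name not in DEFAULTS:
            raise KeyError(f"unknown control parameter: {name}")
        if name in ALLOWED and value not in ALLOWED[name]:
            raise ValueError(f"{value!r} is not a valid option for {name}")
        settings[name] = value
    return settings
```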

Record  set  3. 

If IFSELIT is YES, so that items are selected from the EXPRESFN run, specify the
items to select with this record set.
The format of record set 3 is as follows:

col 1 - 8 IFSELIT
col 9 - 10 Sequence number for this IFSELIT record,
col 11 - 15 Item number of first item to be selected,
col 16 - 20 Item number of second item to be selected,
etc. etc.
col 76 - 80 Item number of 14th item to be selected.

Indicate  a  sequence  of  item  numbers  by  entering  the  first  in  the  sequence  and  the  negative  of 
the  last  in  the  sequence.  Repeat  for  as  many  cards  as  necessary.  Increase  the  sequence 
number  for  each  card.  Do  not  split  a  sequence  across  two  records.  If  the  beginning  of  a 
sequence  would  be  the  last  field  of  a  record,  leave  the  last  field  blank  and  start  the  sequence 
on  the  next  record. 
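
The sequence convention can be made concrete with a short sketch that expands IFSELIT records into the list of selected item numbers. It is written in Python, is not part of PLOTERF, and the function name `expand_ifselit` is hypothetical. It assumes fourteen five-column item fields spanning columns 11 - 80, treats blank fields as unused, and reads a negative entry as the end of a sequence that began with the entry before it on the same record.

```python
def expand_ifselit(records):
    """Expand IFSELIT records into an explicit list of item numbers."""
    items = []
    for record in records:
        line = record.rstrip("\n").ljust(80)
        # Fourteen fields of width 5 in columns 11 - 80 (indices 10..79).
        fields = [line[c:c + 5].strip() for c in range(10, 80, 5)]
        previous = None  # a sequence may not be split across two records
        for number in (int(f) for f in fields if f):
            if number > 0:
                items.append(number)
                previous = number
            elif number < 0:
                last = -number
                if previous is None or last <= previous:
                    raise ValueError("negative entry must close a sequence")
                # The first item is already stored; add the rest up to 'last'.
                items.extend(range(previous + 1, last + 1))
                previous = None
            else:
                raise ValueError("item numbers must be nonzero")
    return items
```

Under these assumptions, a record containing the entries 3, 7, -10, 15 selects items 3, 7, 8, 9, 10, and 15.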